    MedChatZH: a Better Medical Adviser Learns from Better Instructions

    Generative large language models (LLMs) have shown great success in various applications, including question answering (QA) and dialogue systems. However, in specialized domains such as traditional Chinese medical QA, these models may perform poorly without fine-tuning on domain-specific datasets. To address this, we introduce MedChatZH, a dialogue model designed specifically for traditional Chinese medical QA. Our model is pre-trained on traditional Chinese medical books and fine-tuned with a carefully curated medical instruction dataset. It outperforms several strong baselines on a real-world medical dialogue dataset. We release our model, code, and dataset at https://github.com/tyang816/MedChatZH to facilitate further research on traditional Chinese medicine and LLMs. Comment: 7 pages, 3 figures

    PETA: Evaluating the Impact of Protein Transfer Learning with Sub-word Tokenization on Downstream Applications

    Large protein language models are adept at capturing the evolutionary information encoded in primary structures, which gives them significant practical value for protein engineering. Compared with natural language, protein amino acid sequences have a smaller data volume and a more limited combinatorial space, so choosing an appropriate vocabulary size for the pre-trained model is a pivotal issue. Moreover, despite the wealth of benchmarks and studies in the natural language community, there is still no comprehensive benchmark for systematically evaluating the quality of protein language models. To address these challenges, PETA trained language models with 14 different vocabulary sizes under three tokenization methods. It ran thousands of tests on 33 diverse downstream datasets to assess the models' transfer learning capabilities, using two classification heads and three random seeds to mitigate potential biases. Extensive experiments indicate that vocabulary sizes between 50 and 200 yield the best models, whereas sizes above 800 degrade the model's representational performance. Our code, model weights, and datasets are available at https://github.com/ginnm/ProteinPretraining. Comment: 46 pages, 4 figures, 9 tables
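PETA's central variable is the sub-word vocabulary size. The abstract does not name its three tokenization methods, so the sketch below uses byte-pair encoding (BPE), a common sub-word method, as an assumed stand-in. It starts from single amino-acid letters and repeatedly merges the most frequent adjacent pair until the vocabulary reaches `vocab_size`. The function names `merge_pair`, `learn_bpe`, and `encode` are hypothetical and are not taken from the released ProteinPretraining code.

```python
from collections import Counter


def merge_pair(tokens, a, b):
    """Replace non-overlapping left-to-right occurrences of the pair (a, b) with a + b."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_bpe(sequences, vocab_size):
    """Learn BPE merges over amino-acid sequences until the vocabulary has vocab_size tokens."""
    corpus = [list(seq) for seq in sequences]  # start from single residues
    vocab = {tok for seq in corpus for tok in seq}
    merges = []
    while len(vocab) < vocab_size:
        # Count every adjacent token pair across the corpus.
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # every sequence is a single token; nothing left to merge
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        vocab.add(a + b)
        corpus = [merge_pair(seq, a, b) for seq in corpus]
    return merges, vocab


def encode(seq, merges):
    """Tokenize a new sequence by replaying the learned merges in order."""
    tokens = list(seq)
    for a, b in merges:
        tokens = merge_pair(tokens, a, b)
    return tokens
```

On a real pre-training corpus, a call such as `learn_bpe(sequences, 100)` would correspond to one point in the 50 to 200 range that the abstract reports as best. Larger values let frequent motifs become single tokens at the cost of sparser token statistics.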

    A general Temperature-Guided Language model to engineer enhanced Stability and Activity in Proteins

    Designing protein mutants with high stability and activity is a critical yet challenging task in protein engineering. Here, we introduce PRIME, an innovative deep learning approach for the zero-shot prediction of both protein stability and enzymatic activity. PRIME leverages temperature-guided language modelling to provide robust and precise predictions without relying on prior experimental mutagenesis data. Tested on 33 protein datasets, PRIME demonstrated superior predictive performance and generalizability compared with current state-of-the-art models. Comment: arXiv admin note: text overlap with arXiv:2304.0378
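The abstract does not describe how PRIME converts its temperature-guided language model into mutation scores. As an assumed illustration of zero-shot scoring in general, and not PRIME's own method, the sketch below uses the common log-odds rule: a mutant's score is the sum, over mutated positions, of log P(mutant residue) minus log P(wild-type residue) under a model's per-position distribution. `position_probs` is a hypothetical stand-in for real model output, and summing over sites assumes the mutations act independently.

```python
import math


def parse_mutation(code):
    """Split a mutation such as 'K2R' into (wild-type residue, 1-based position, mutant residue)."""
    return code[0], int(code[1:-1]), code[-1]


def zero_shot_score(wild_type, mutations, position_probs):
    """Log-odds score of a (multi-)mutant relative to the wild type.

    position_probs[i] is a dict mapping amino acids to probabilities at 0-based
    position i; it is a hypothetical stand-in for a protein language model's output.
    Higher scores mean the model finds the mutant more plausible than the wild type.
    """
    score = 0.0
    for code in mutations:
        wt, pos, mt = parse_mutation(code)
        i = pos - 1
        if wild_type[i] != wt:
            raise ValueError(f"{code}: residue {pos} of the wild type is {wild_type[i]}, not {wt}")
        probs = position_probs[i]
        # Additive (independent-site) assumption: sum per-position log-odds.
        score += math.log(probs[mt]) - math.log(probs[wt])
    return score
```

Because no mutagenesis labels enter the score, ranking candidate mutants by it is "zero-shot" in the sense the abstract uses: only the pre-trained model's probabilities are needed.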

    EbMYBP1, a R2R3-MYB transcription factor, promotes flavonoid biosynthesis in Erigeron breviscapus

    Erigeron breviscapus, a traditional Chinese medicinal plant, is rich in flavonoids that are beneficial to human health. Although R2R3-MYB transcription factors (TFs) are known to be crucial to the flavonoid pathway, the transcriptional regulation of flavonoid biosynthesis in E. breviscapus has not been fully elucidated. Here, EbMYBP1, an R2R3-MYB transcription factor, was identified as a regulator of flavonoid accumulation. Transcriptome and metabolome analyses revealed that a large group of genes related to flavonoid biosynthesis changed significantly, accompanied by significantly increased flavonoid concentrations in EbMYBP1-OE transgenic tobacco compared with the wild type (WT). In vitro and in vivo investigations showed that EbMYBP1 participates in flavonoid biosynthesis as a nucleus-localized transcriptional activator, activating the transcription of flavonoid-associated genes such as FLS, F3H, CHS, and CHI by directly binding to their promoters. Collectively, these findings advance our understanding of the transcriptional regulation of flavonoid biosynthesis.

    Estimating Soil Water Characteristic Curve from soil Physical-Chemical properties in Alluvial Plain

    Modeling of Adaptive Cyber Physical Systems using Aspect-oriented Approach

    This paper proposes an aspect-oriented approach to modeling adaptive cyber-physical systems (CPSs) using Petri nets. The core concerns of a CPS are described as a device model and a task model, while dynamic variations in system behavior or environmental conditions are extracted as crosscutting concerns. Runtime inspection, device adaptation, and task adaptation are modeled as aspect nets. For the device adaptation strategy, fault types are analyzed and the control-loop concept is integrated to form the adaptation aspect model. For task adaptation, a rescheduling method is proposed that uses a PSO-Pareto algorithm to find the best solution among the backup devices. Via well-defined rules, these aspect nets can be woven into the core concern nets to form a comprehensive adaptive CPS model. Through theoretical analysis and a case study, we show that the modeling approach is feasible and flexible and that it simplifies the design of adaptive CPSs.
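The weaving step can be illustrated with a minimal place/transition net. The abstract does not state the paper's weaving rules, so this sketch assumes the simplest one, place fusion: each aspect place listed in `join_points` is merged with a core place, and all other aspect places and transitions are renamed with an `aspect.` prefix. `PetriNet` and `weave` are hypothetical names, and the sketch covers only a device-fault aspect, not the paper's inspection or PSO-Pareto task-adaptation nets.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """Place/transition net with integer arc weights."""
    marking: dict = field(default_factory=dict)      # place -> token count
    transitions: dict = field(default_factory=dict)  # name -> (pre arcs, post arcs)

    def add_transition(self, name, pre, post):
        self.transitions[name] = (dict(pre), dict(post))
        for place in list(pre) + list(post):
            self.marking.setdefault(place, 0)  # new places start empty

    def enabled(self, name):
        pre, _ = self.transitions[name]
        return all(self.marking[p] >= w for p, w in pre.items())

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        pre, post = self.transitions[name]
        for p, w in pre.items():
            self.marking[p] -= w  # consume input tokens
        for p, w in post.items():
            self.marking[p] += w  # produce output tokens


def weave(core, aspect, join_points):
    """Combine core and aspect nets into a new net by fusing join-point places.

    join_points maps aspect place names to core place names (assumed weaving rule);
    all other aspect places and transitions get an 'aspect.' prefix. Inputs are not modified.
    """
    def rename(place):
        return join_points.get(place, f"aspect.{place}")

    woven = PetriNet(marking=dict(core.marking))
    for name, (pre, post) in core.transitions.items():
        woven.add_transition(name, pre, post)
    for name, (pre, post) in aspect.transitions.items():
        woven.add_transition(
            f"aspect.{name}",
            {rename(p): w for p, w in pre.items()},
            {rename(p): w for p, w in post.items()},
        )
    # Private aspect places keep their own initial tokens; fused places keep the core's.
    for place, tokens in aspect.marking.items():
        if place not in join_points:
            woven.marking[rename(place)] = tokens
    return woven
```

For example, a core net with `start: idle -> running` and `finish: running -> done`, woven with an aspect whose `fail` and `recover` transitions act on a join place fused with `running` and on a private `backup` place, lets a task fail, switch to the backup, and still reach `done`, while the core net itself never mentions faults.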
